
    Combined burden and functional impact tests for cancer driver discovery using DriverPower

    The discovery of driver mutations is one of the key motivations for cancer genome sequencing. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole-genome sequencing data from 2,658 cancers across 38 tumour types, we describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify driver mutations at coding and non-coding sites within cancer whole genomes. Using a total of 1,373 genomic features derived from public sources, DriverPower's background mutation model explains up to 93% of the regional variance in mutation rate across multiple tumour types. Incorporating functional impact scores further increases the accuracy of driver discovery. Across 2,583 cancer genomes from the PCAWG project, DriverPower identifies 217 coding and 95 non-coding driver candidates. Compared with six published methods used by the PCAWG Drivers and Functional Interpretation Working Group, DriverPower achieves the highest F1 score for both coding and non-coding driver discovery. This demonstrates that DriverPower is an effective framework for computational driver discovery.
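    DriverPower's burden evidence compares the number of mutations observed in an element with the number its background model predicts. The abstract does not give the test distribution, so the following is a minimal sketch assuming a Poisson count model and a known per-site background rate; `poisson_sf`, `burden_pvalue` and their parameters are hypothetical names, not DriverPower's API or model.

```python
import math


def poisson_sf(k: int, mu: float) -> float:
    """Return P(X >= k) for X ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    if mu <= 0.0:
        return 0.0

    def log_pmf(j: int) -> float:
        # log of exp(-mu) * mu**j / j!, computed in log space to avoid overflow
        return -mu + j * math.log(mu) - math.lgamma(j + 1)

    if k > mu:
        # Upper tail: for j >= k > mu each term is smaller than the last,
        # so sum until the remaining terms are negligible.
        total, j = 0.0, k
        while True:
            term = math.exp(log_pmf(j))
            total += term
            if term == 0.0 or term < total * 1e-17:
                return total
            j += 1
    # When k <= mu the upper tail is large, so the complement of P(X <= k - 1) is safe.
    lower = sum(math.exp(log_pmf(j)) for j in range(k))
    return max(0.0, 1.0 - lower)


def burden_pvalue(observed: int, background_rate: float,
                  n_sites: int, n_donors: int) -> float:
    """Hypothetical burden test: is the element mutated more often than expected?

    background_rate is an assumed per-site, per-donor mutation probability
    predicted by some background model (e.g. a regression on genomic features).
    """
    expected = background_rate * n_sites * n_donors
    return poisson_sf(observed, expected)


# Illustrative numbers only: 500 sites, 100 donors, rate 1e-4 -> about 5 expected mutations.
print(burden_pvalue(observed=15, background_rate=1e-4, n_sites=500, n_donors=100))
```

    Working in log space keeps the tail probability stable for elements with many expected mutations; a production background model would typically also have to handle overdispersion, which a plain Poisson model ignores.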

    Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis

    Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means of identifying deeply conserved roles of lncRNAs in tumorigenesis.
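    Statements such as "CLC genes are enriched amongst driver genes" are typically backed by an overlap test. The abstract does not name the test used, so this is a generic sketch of a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the 2x2 overlap table); `enrichment_pvalue` and its parameter names are hypothetical.

```python
from math import comb


def enrichment_pvalue(overlap: int, n_marked: int, n_drawn: int, n_universe: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    n_universe genes in total, n_marked of them in set A (e.g. a census list),
    n_drawn of them in set B (e.g. predicted drivers); X is the overlap of a
    randomly chosen set B of that size with set A.
    """
    total = comb(n_universe, n_drawn)
    hits = sum(
        comb(n_marked, k) * comb(n_universe - n_marked, n_drawn - k)
        for k in range(overlap, min(n_marked, n_drawn) + 1)
    )
    # Exact integer ratio; Python rounds the true division to float only once.
    return hits / total
```

    Using exact integer binomial coefficients avoids the rounding drift that accumulates when many small floating-point probabilities are summed.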

    Analyses of non-coding somatic drivers in 2,658 cancer whole genomes

    The discovery of drivers of cancer has traditionally focused on protein-coding genes [1-4]. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [5] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers [6,7], raise doubts about others and identify novel candidates, including point mutations in the 5' region of TP53, in the 3' untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.
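    The abstract does not spell out how significance levels from several driver-discovery methods are combined. As a baseline for comparison only, Fisher's method is sketched below. It assumes the p-values are independent, which does not hold when several methods analyse the same mutations; handling that dependence is one of the limitations a rigorous combination strategy has to address. The function names and example p-values are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi2_df > x) for even df, via exp(-x/2) * sum_{j < df/2} (x/2)**j / j!."""
    half = x / 2.0
    term = math.exp(-half)
    total = term
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return min(total, 1.0)


def fisher_combine(pvalues: list[float]) -> float:
    """Fisher's method: -2 * sum(ln p_i) ~ chi2 with 2k df under independent nulls."""
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must be in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))


# Hypothetical p-values for one element from three driver-discovery methods.
print(fisher_combine([0.01, 0.03, 0.2]))
```

    Because the degrees of freedom are always even here, the chi-squared tail has a closed form, so no statistics library is needed.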

    Pan-Cancer Analysis of Non-Coding Driver Mutations

    Cancers are caused by genomic alterations known as drivers. Because drivers have broad applications in precision oncology, their discovery has become one of the central motivations for cancer genomics. At present, most drivers have been found in protein-coding regions, which make up only ~2% of the genome. Despite an intensive search for non-coding cancer drivers, only a few have been discovered to date. Here I describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify drivers within cancer whole genomes. Using 1,373 genomic features, DriverPower's background model explains up to 93% of the regional variance in mutation rates across multiple tumour types. By incorporating functional impact scores, I further increase the accuracy of driver discovery. Compared with six published methods, DriverPower has the highest F1 score for both coding and non-coding driver discovery. Applied to 2,583 cancer genomes from public sources, DriverPower identifies 217 coding and 95 non-coding driver candidates in well-defined genomic regions, including novel candidates such as the SGK1 splice site, the GPR126 enhancer and the ALB promoter. To test whether the surprisingly low number of non-coding drivers reflects drivers missed in poorly defined genomic regions, I investigated non-coding spliceosomal RNAs, since protein-coding splicing factors are frequently mutated in cancer. I found a highly recurrent A>C somatic mutation at the third base of U1 spliceosomal RNA across several tumour types. This mutation changes the preferred A-U base pairing between U1 and the 5′ splice site to C-G base pairing, thereby creating novel splice junctions and altering the splicing pattern of multiple genes, including known cancer drivers. Clinically, the A>C mutation is associated with alcohol abuse in hepatocellular carcinoma and with the aggressive subtype of chronic lymphocytic leukaemia (CLL), and it is an independent adverse prognostic factor in CLL. This finding identifies the first non-coding driver in spliceosomal RNAs, reveals a novel mechanism of aberrant splicing in cancer and may represent a new therapeutic target. Together, my research indicates that non-coding mutations play crucial roles in cancer, and that future studies should focus on completing the cancer driver catalog and using it for precision oncology.
    Ph.D.
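    Both DriverPower abstracts rank methods by F1 score, the harmonic mean of precision and recall measured against a reference set of known drivers. The reference set is not described here, so this sketch uses placeholder element names (`GENE_A` and so on are hypothetical) only to show the calculation.

```python
def precision_recall_f1(predicted: set[str], reference: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of a predicted driver set against a reference set."""
    tp = len(predicted & reference)  # true positives: predictions found in the reference
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Placeholder element names, not real results.
p, r, f1 = precision_recall_f1({"GENE_A", "GENE_B", "GENE_C", "GENE_D"},
                               {"GENE_A", "GENE_B", "GENE_E"})
print(p, r, f1)
```

    Because F1 is a harmonic mean, a method cannot score well by calling many candidates (high recall, low precision) or very few (high precision, low recall).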

    train_feature.hdf5.part1

    Part 1/3 of training genomic features

    train_feature.hdf5.part2

    Part 2/3 of training genomic features

    test_feature.hdf5

    Genomic features for test elements (promoters, enhancers, CDS, UTRs)

    train_feature.hdf5.part3

    Part 3/3 of training genomic features
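    The training features are distributed as three parts. If the parts are a plain byte-level split of a single HDF5 file (an assumption; the listing does not say how they were produced), they can be rejoined by concatenating them in order, as `cat` would in a shell. The sketch below does that and checks the standard HDF5 file signature as a sanity check; `reassemble` and `looks_like_hdf5` are hypothetical helper names.

```python
import shutil

# HDF5 files normally start with this 8-byte signature (when no user block precedes it).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def reassemble(parts: list[str], output: str) -> None:
    """Concatenate byte-split parts, in the given order, into one file."""
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Streams in chunks, so large parts are never read fully into memory.
                shutil.copyfileobj(src, out)


def looks_like_hdf5(path: str) -> bool:
    """Return True if the file begins with the HDF5 signature."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

    With the three parts in the working directory, `reassemble([f"train_feature.hdf5.part{i}" for i in (1, 2, 3)], "train_feature.hdf5")` followed by `looks_like_hdf5("train_feature.hdf5")` should confirm the rejoined file at least starts like an HDF5 file; order matters, so the parts must be passed as part1, part2, part3.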